Lecture 01
Dr. Colin Rundel
Pretty much everything we are going to see in this course will fall under the umbrella of either linear or generalized linear models.
In previous classes most of your time has likely been spent on the simple iid case,
\[Y_i = \beta_0 + \beta_1 \, x_{i1} + \cdots + \beta_p \, x_{ip} + \epsilon_i \] \[ \epsilon_i \sim N(0, \sigma^2) \]
these models can also be expressed simply as,
\[ Y_i \overset{iid}{\sim} N(\beta_0 + \beta_1 \, x_{i1} + \cdots + \beta_p \, x_{ip},~ \sigma^2) \]
We can also express this model using matrix notation,
\[ \begin{aligned} \underset{n \times 1}{\boldsymbol{Y}} = \underset{n \times p}{\boldsymbol{X}} \, \underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\epsilon}} \\ \underset{n \times 1}{\boldsymbol{\epsilon}} \sim N(\underset{n \times 1}{\boldsymbol{0}}, \; \sigma^2 \underset{n \times n}{\mathbb{1}_n}) \end{aligned} \]
or alternatively as,
\[ \underset{n \times 1}{\boldsymbol{Y}} \sim N\left(\underset{n \times p}{\boldsymbol{X}} \, \underset{p \times 1}{\boldsymbol{\beta}},~ \sigma^2 \underset{n \times n}{\mathbb{1}_n}\right) \]
An \(n\)-dimensional multivariate normal distribution with covariance \(\boldsymbol{\Sigma}\) (positive semidefinite) can be written as
\[ \underset{n \times 1}{\boldsymbol{Y}} \sim N(\underset{n \times 1}{\boldsymbol{\mu}}, \; \underset{n \times n}{\boldsymbol{\Sigma}}) \text{ where } \{\boldsymbol{\Sigma}\}_{ij} = \rho_{ij} \sigma_i \sigma_j \]
\[ \begin{pmatrix} Y_1\\ Y_2\\ \vdots\\ Y_n \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1\\ \mu_2\\ \vdots\\ \mu_n \end{pmatrix}, \, \begin{pmatrix} \rho_{11}\sigma_1\sigma_1 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \rho_{21}\sigma_2\sigma_1 & \rho_{22}\sigma_2\sigma_2 & \cdots & \rho_{2n}\sigma_2\sigma_n\\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1}\sigma_n\sigma_1 & \rho_{n2}\sigma_n\sigma_2 & \cdots & \rho_{nn}\sigma_n\sigma_n \\ \end{pmatrix} \right) \]
For the \(n\)-dimensional multivariate normal given on the last slide, its density is given by
\[ (2\pi)^{-n/2} \; \det(\boldsymbol{\Sigma})^{-1/2} \; \exp{\left(-\frac{1}{2} \underset{1 \times n}{(\boldsymbol{Y}-\boldsymbol{\mu})'} \underset{n \times n}{\boldsymbol{\Sigma}^{-1}} \underset{n \times 1}{(\boldsymbol{Y}-\boldsymbol{\mu})}\right)} \]
and its log density is given by
\[ -\frac{n}{2} \log 2\pi - \frac{1}{2} \log \det(\boldsymbol{\Sigma}) - \frac{1}{2} \underset{1 \times n}{(\boldsymbol{Y}-\boldsymbol{\mu})'} \underset{n \times n}{\boldsymbol{\Sigma}^{-1}} \underset{n \times 1}{(\boldsymbol{Y}-\boldsymbol{\mu})} \]
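As a quick sanity check on the log density formula, here is a minimal Python sketch (the course uses R; this is just an illustration) that evaluates it for the bivariate case, writing out \(\det(\boldsymbol{\Sigma})\) and \(\boldsymbol{\Sigma}^{-1}\) by hand for \(\{\boldsymbol{\Sigma}\}_{ij} = \rho_{ij}\sigma_i\sigma_j\). With \(\rho = 0\) the components are independent, so the joint log density must equal the sum of the two univariate normal log densities.

```python
import math

def mvn_logpdf_2d(y, mu, s1, s2, rho):
    """Bivariate normal log density, with det(Sigma) and the quadratic
    form (Y-mu)' Sigma^{-1} (Y-mu) expanded by hand for the 2x2 case."""
    d1 = (y[0] - mu[0]) / s1
    d2 = (y[1] - mu[1]) / s2
    det = (s1 * s2) ** 2 * (1 - rho ** 2)
    quad = (d1 ** 2 - 2 * rho * d1 * d2 + d2 ** 2) / (1 - rho ** 2)
    # -n/2 log 2pi - 1/2 log det(Sigma) - 1/2 quad, with n = 2
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def norm_logpdf(y, mu, s):
    """Univariate normal log density."""
    return -0.5 * math.log(2 * math.pi * s ** 2) - (y - mu) ** 2 / (2 * s ** 2)

# rho = 0 => independence, so the joint splits into the marginals
lhs = mvn_logpdf_2d((1.0, -0.5), (0.0, 0.0), 1.0, 2.0, 0.0)
rhs = norm_logpdf(1.0, 0.0, 1.0) + norm_logpdf(-0.5, 0.0, 2.0)
```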
Let's generate some simulated data where the underlying model is known and see how various regression procedures perform.
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} +\epsilon_i \] \[ \beta_0 = 0.7, \quad \beta_1 = 1.5, \quad \beta_2 = -2.2, \quad \beta_3 = 0.1 \] \[ \epsilon_i \sim N(0,1) \]
# A tibble: 100 × 4
`(Intercept)` X1 X2 X3
<dbl> <dbl> <dbl> <dbl>
1 1 1.26 -3.30 0.664
2 1 1.17 -4.88 -2.20
3 1 -0.590 -1.08 -1.95
4 1 0.358 0.319 -0.582
5 1 1.20 -0.314 -2.41
6 1 0.856 -0.733 0.274
7 1 0.649 -0.385 0.112
8 1 -1.58 -0.449 -1.89
9 1 -0.424 1.22 -0.193
10 1 0.0808 1.27 -0.488
# … with 90 more rows
Let \(\hat{\boldsymbol{Y}}\) be our estimate for \(\boldsymbol{Y}\) based on our estimate of \(\boldsymbol{\beta}\), \[ \hat{\boldsymbol{Y}} = \hat{\beta}_0 + \hat{\beta}_1 \, \boldsymbol{X}_{1} + \hat{\beta}_2 \, \boldsymbol{X}_{2} + \hat{\beta}_3 \, \boldsymbol{X}_{3} = \boldsymbol{X}\, \hat{\boldsymbol{\beta}} \]
The least squares estimate, \(\hat{\boldsymbol{\beta}}_{ls}\), is given by \[ \underset{\boldsymbol{\beta}}{\text{argmin}} \sum_{i=1}^n \left( Y_i - \boldsymbol{X}_{i\cdot} \boldsymbol{\beta} \right)^2 \]
Previously we derived, \[ \hat{\boldsymbol{\beta}}_{ls} = (\boldsymbol{X}' \boldsymbol{X})^{-1} \boldsymbol{X}' \, \boldsymbol{Y} \]
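To see the closed form in action, here is a dependency-free Python sketch (an illustration, not the course's R workflow) for the simplest case of an intercept plus one predictor, where the \(2 \times 2\) inverse in \((\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y}\) can be written out by hand. The particular data values are made up for the example.

```python
def ols_2(x, y):
    """Closed-form beta_hat = (X'X)^{-1} X'Y for simple regression,
    with X = [1, x] and the 2x2 inverse expanded by hand."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]],  X'Y = [sy, sxy]
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Noise-free data from y = 0.7 + 1.5 x recovers the coefficients exactly
x = [0.0, 1.0, 2.0, 3.0]
y = [0.7 + 1.5 * xi for xi in x]
b0, b1 = ols_2(x, y)
```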
List of 2
$ : 'mcmc' num [1:5000, 1:5] 0.607 0.659 0.668 0.829 0.801 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "beta[1]" "beta[2]" "beta[3]" "beta[4]" ...
..- attr(*, "mcpar")= num [1:3] 1001 6000 1
$ : 'mcmc' num [1:5000, 1:5] 0.788 0.769 0.494 0.878 0.566 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "beta[1]" "beta[2]" "beta[3]" "beta[4]" ...
..- attr(*, "mcpar")= num [1:3] 1001 6000 1
- attr(*, "class")= chr "mcmc.list"
Markov Chain Monte Carlo (MCMC) output:
Start = 1001
End = 1007
Thinning interval = 1
beta[1] beta[2] beta[3] beta[4] sigma2
[1,] 0.6073990 1.421458 -2.176067 0.1349446 0.993633
[2,] 0.6593313 1.463185 -2.446135 0.1737123 1.056151
[3,] 0.6680386 1.320887 -2.367218 0.3249702 1.193919
[4,] 0.8287267 1.492184 -2.126412 0.1882640 1.306094
[5,] 0.8008090 1.381371 -2.431426 0.1651416 1.126294
[6,] 0.7237356 1.316525 -2.532359 0.1659811 1.132194
[7,] 0.5368579 1.440870 -2.482804 0.2317600 1.107264
# A tibble: 50,000 × 7
# Groups: i, .variable [5]
i .chain .iteration .draw .variable .value param
<int> <int> <int> <int> <chr> <dbl> <chr>
1 1 1 1 1 beta 0.607 beta[1]
2 1 1 2 2 beta 0.659 beta[1]
3 1 1 3 3 beta 0.668 beta[1]
4 1 1 4 4 beta 0.829 beta[1]
5 1 1 5 5 beta 0.801 beta[1]
6 1 1 6 6 beta 0.724 beta[1]
7 1 1 7 7 beta 0.537 beta[1]
8 1 1 8 8 beta 0.552 beta[1]
9 1 1 9 9 beta 0.760 beta[1]
10 1 1 10 10 beta 0.651 beta[1]
# … with 49,990 more rows
# A tibble: 5 × 4
param truth ols post_mean
<chr> <dbl> <dbl> <dbl>
1 beta[1] 0.7 0.657 0.656
2 beta[2] 1.5 1.47 1.46
3 beta[3] -2.2 -2.28 -2.28
4 beta[4] 0.1 0.163 0.162
5 sigma2 1 1.08 1.14
Likelihood:
\[ \boldsymbol{Y} \,|\, \boldsymbol{\beta}, \, \sigma^2 \sim N(\boldsymbol{X}\boldsymbol{\beta},\, \sigma^2 \, {\mathbb{1}_n}) \]
Priors: \[ \beta_i \sim N(0, \sigma^2_\beta) \text{ or } \boldsymbol{\beta} \sim N(\boldsymbol{0}, \sigma^2_\beta \, {\mathbb{1}_p}) \]
\[ \sigma^2 \sim \text{Inv-Gamma}(a,\,b) \]
\[ \begin{aligned} \left[ \boldsymbol{\beta}, \sigma^2 \,|\, \boldsymbol{Y}, \boldsymbol{X} \right] &= \frac{[\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\beta}, \sigma^2]}{[\boldsymbol{Y} \,|\, \boldsymbol{X}]} [\boldsymbol{\beta}, \sigma^2] \\ &\propto [\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\beta}, \sigma^2] [\boldsymbol{\beta},\,\sigma^2] \\ &\propto [\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\beta}, \sigma^2] [\boldsymbol{\beta}\,|\,\sigma^2] [\sigma^2] \end{aligned} \]
where,
\[ f(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\beta}, \sigma^2) = \left(2\pi \sigma^2\right)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} (\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{\beta})'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{\beta}) \right) \]
\[ f(\boldsymbol{\beta}\,|\, \sigma^2_\beta) = (2\pi \sigma^2_\beta)^{-p/2} \exp\left( -\frac{1}{2\sigma^2_\beta} \boldsymbol{\beta}'\boldsymbol{\beta} \right) \]
\[ f(\sigma^2 \,|\, a,\, b) = \frac{b^a}{\Gamma(a)} (\sigma^2)^{-a-1} \exp\left( -\frac{b}{\sigma^2} \right) \]
\[ \begin{aligned} \left[ \boldsymbol{\beta}, \sigma^2 \,|\, \boldsymbol{Y}, \boldsymbol{X} \right] \propto &\left(2\pi \sigma^2\right)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} (\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{\beta})'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{\beta}) \right) \\ &(2\pi \sigma^2_\beta)^{-p/2} \exp\left( -\frac{1}{2\sigma^2_\beta} \boldsymbol{\beta}'\boldsymbol{\beta} \right) \\ &\frac{b^a}{\Gamma(a)} (\sigma^2)^{-a-1} \exp\left( -\frac{b}{\sigma^2} \right) \end{aligned} \]
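The posterior draws shown earlier come from an MCMC fit of this model. As a hedged illustration of how the posterior kernel above can be sampled, here is a minimal Gibbs sampler in Python (not the course's actual R/JAGS code) for the stripped-down single-coefficient case \(y_i = \beta x_i + \epsilon_i\) with no intercept; the priors here use \(\sigma^2_\beta = 100\) and \(a = b = 1\), and the data sizes, seed, and iteration counts are all assumptions for the sketch. The full conditionals are \(\beta \,|\, \sigma^2, \boldsymbol{Y} \sim N(m, v)\) and \(\sigma^2 \,|\, \beta, \boldsymbol{Y} \sim \text{Inv-Gamma}(a + n/2,\, b + \text{RSS}/2)\).

```python
import math, random

random.seed(1)

# Simulated data from y_i = beta * x_i + eps_i, beta = 1.5, sigma^2 = 1
n, beta_true = 200, 1.5
x = [random.gauss(0, 1) for _ in range(n)]
y = [beta_true * xi + random.gauss(0, 1) for xi in x]

a, b, s2_beta = 1.0, 1.0, 100.0     # Inv-Gamma(a, b) and N(0, s2_beta) priors
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

beta, s2 = 0.0, 1.0
draws_beta, draws_s2 = [], []
for it in range(2000):
    # beta | sigma^2, Y ~ N(m, v), the conjugate normal update
    v = 1.0 / (sxx / s2 + 1.0 / s2_beta)
    m = v * sxy / s2
    beta = random.gauss(m, math.sqrt(v))
    # sigma^2 | beta, Y ~ Inv-Gamma(a + n/2, b + RSS/2),
    # drawn as the reciprocal of a Gamma(shape, scale) draw
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = 1.0 / random.gammavariate(a + n / 2, 1.0 / (b + rss / 2))
    if it >= 500:                    # discard burn-in
        draws_beta.append(beta)
        draws_s2.append(s2)

post_mean_beta = sum(draws_beta) / len(draws_beta)
post_mean_s2 = sum(draws_s2) / len(draws_s2)
```

With \(n = 200\) the data dominate the weak priors, so the posterior means should land close to the generating values, mirroring the truth/ols/post_mean comparison table earlier in the slides.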
Sta 344 - Fall 2022